Introduction

As living standards rise, people pay increasing attention to their health and lifestyle. Healthy eating, which covers both regular eating habits and a nutritious diet, is an indispensable part of maintaining the body and plays a vital role in strengthening immunity and preventing chronic disease. As described in the “Eating and Health Module” (2019, Economic Research Service), eating patterns are strongly affected by people’s economic status and behaviors, including household income, engagement in physical activity, food preparation styles, and access to grocery stores. From a public health and policy perspective, understanding how these indicators affect eating health can help the government develop effective and efficient food and nutrition assistance programs and, in turn, support a more stable and harmonious relationship between people and society. With these benefits in mind, this project explores the impact of different indicators on personal eating health.

Data

Dataset Information

The dataset, named “Eating and Health Module”, was downloaded from the Economic Research Service of the United States Department of Agriculture. It is a case-specific respondent file that includes general biological information (height, weight), household information (income, participation in government assistance programs), eating habits (fast food consumption, drinking habits, food preparation), and the physical activities of respondents. All data were collected by questionnaire and survey and are based on the answers of voluntary respondents. We began by exploring the sample of respondents through descriptive statistics on their general health information. We then analyzed the associations and distributions of several internal and external indicators to understand their impact on personal eating health, using personal BMI as the outcome measure. The indicators include access to grocery stores, household income, frequency of fast food consumption, and engagement in physical activity. The analysis includes association plots and a multiple linear regression model. We also set up a hypothesis test to measure the similarity or difference between the primary and secondary eating behaviors of respondents.

Tidy Process

The raw dataset contains 11212 observations and 37 variables, of which we selected 10 for our project. The variables are categorical, logical, or continuous, with the validation rule that valid entries are positive values or specified positive codes. However, because the dataset relies entirely on voluntary questionnaire responses, it contains a large number of invalid entries, such as negative values recorded when a respondent was unwilling to answer a question. In addition, for several continuous variables, such as the frequency of fast food consumption or the frequency of exercise, responses cluster on a small set of integers, so these variables behave more like categorical variables.

Because the original dataset contains many categorical variables, we first recoded each categorical variable into its corresponding categories, following the codebook provided with the dataset. To exclude invalid values, we then tidied the raw dataset separately for each statistical analysis, keeping only the valid entries that the analysis requires.
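The tidying step described above can be sketched as follows. This is a minimal illustration in Python rather than the project's actual code: the column names (`bmi`, `fastfood_freq`, `grocery_access`), the category labels, and the rule that any negative value marks a non-response are assumptions made for the example. The real code-to-label mappings come from the codebook.

```python
# Hypothetical codebook excerpt: numeric code -> label (assumed, not taken from the codebook).
GROCERY_LABELS = {1: "yes", 2: "no"}

def tidy(rows, required):
    """Recode one categorical column and drop rows with invalid entries.

    rows: list of dicts, one per respondent.
    required: names of the numeric columns that a given analysis needs.
    """
    cleaned = []
    for row in rows:
        # Negative values are assumed to mark refusals or non-responses.
        if any(row[col] is None or row[col] < 0 for col in required):
            continue
        recoded = dict(row)  # copy so the raw row is left untouched
        recoded["grocery_access"] = GROCERY_LABELS.get(row["grocery_access"], "unknown")
        cleaned.append(recoded)
    return cleaned

raw = [
    {"bmi": 24.1, "fastfood_freq": 2, "grocery_access": 1},
    {"bmi": -1.0, "fastfood_freq": 3, "grocery_access": 2},   # BMI not reported
    {"bmi": 31.5, "fastfood_freq": -2, "grocery_access": 1},  # frequency not reported
    {"bmi": 27.8, "fastfood_freq": 0, "grocery_access": 2},
]

tidy_rows = tidy(raw, required=["bmi", "fastfood_freq"])
print(len(tidy_rows), "valid rows kept")
```

Passing a different `required` list for each analysis mirrors the idea of tidying the raw data separately per analysis, so that a respondent is dropped only when a variable that analysis actually uses is invalid.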

Statistical Analysis

Descriptive Statistics for Respondents

Association Analysis for Different Indicators vs. Eating Health

Multi-regression Model for BMI vs. Frequency of Fast Food Consumption and Exercises

As the pace of modern life accelerates, fast food, which is typically high in carbohydrates, salt, and fat and low in fiber, has become a mainstay of daily meals. Long-term intake of fast food has been linked to multiple chronic conditions, such as diabetes, poor nutrition, and obesity. Exercise, in contrast, plays a key role in maintaining and strengthening physical health. We therefore expect the two indicators to show associations with personal BMI that differ in both magnitude and sign. We built a multiple regression model to explore the association between BMI and the frequency of fast food consumption, incorporating the frequency of exercise.

In the scatter plot of BMI vs. frequency of fast food consumption, the nominally continuous frequency variable appears approximately categorical, which is probably a result of collecting the data by questionnaire. The regression line of BMI vs. frequency of fast food consumption has a positive slope: as fast food consumption increases, BMI also tends to increase, suggesting that fast food consumption is related to poorer eating health. The regression line of BMI vs. frequency of exercise has a negative slope, meaning that a higher frequency of exercise is associated with lower BMI. However, with an R-squared of only 0.0042997, the model explains very little of the variation in BMI and needs further adjustment to detect associations between BMI and the two frequencies. We therefore built a multiple linear regression model that incorporates both indicators.

The multiple regression model of BMI vs. frequency of fast food consumption and exercise gives an estimated coefficient of 0.1025 for fast food consumption (p-value 0.0125) and -0.1885 for exercise (p-value 1.3260743 × 10^{-7}), with an R-squared of 0.0085. Both p-values indicate that the two coefficients are statistically significant. The low R-squared value, however, calls for further residual and outlier checks. The residuals clearly deviate from a normal distribution, and both the scatter plots and the leverage plots show multiple outliers that affect the fitted regression of BMI on frequency of fast food consumption and exercise. We therefore adjusted the model by controlling for these outliers.
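To make the structure of such a model concrete, the sketch below fits BMI on two frequency predictors by ordinary least squares and reports R-squared. It is only an illustration: it uses NumPy, the data are synthetic (generated here, not the survey data), and the function name `fit_ols` and all variable names are assumptions made for the example.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R-squared)."""
    design = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)            # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())      # total variation
    return beta, 1.0 - ss_res / ss_tot

# Synthetic data for illustration only (assumed values, not the survey data).
rng = np.random.default_rng(0)
n = 500
fastfood = rng.integers(0, 8, size=n).astype(float)  # times per week
exercise = rng.integers(0, 8, size=n).astype(float)  # times per week
bmi = 27.0 + 0.10 * fastfood - 0.19 * exercise + rng.normal(0.0, 5.0, size=n)

beta, r2 = fit_ols(np.column_stack([fastfood, exercise]), bmi)
print("intercept, fast food, exercise:", beta)
print("R-squared:", r2)
```

When the noise in BMI is large relative to the slopes, as in this synthetic setup, R-squared stays small even though the predictors do enter the data-generating process, which is one reason a low R-squared and significant coefficients can coexist.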

We excluded 89 outliers identified by their internally studentized residuals. Compared with the unadjusted model, the adjusted multiple regression model gives a lower estimated slope of 0.0922 for BMI vs. frequency of fast food consumption, with stronger statistical significance (p-value 0.0095). The estimated slope for BMI vs. exercise is larger in magnitude, at -0.232211. The QQ plot shows that excluding the outliers brings the residuals closer to a normal distribution. R-squared rises to 0.0153, an increase of 0.0068, and adjusted R-squared is 0.0148, also an increase of 0.0068.
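For readers unfamiliar with the term, an internally studentized residual divides each residual by its estimated standard deviation, e_i / (s · sqrt(1 − h_ii)), where s² is the residual variance estimated from all observations and h_ii is the leverage of observation i. The sketch below shows one way to compute these values with NumPy and flag outliers. It runs on synthetic data, and the cutoff of 3 is an assumption made for the example, since the cutoff used in the project is not stated.

```python
import numpy as np

def internally_studentized(X, y):
    """Internally studentized residuals of an OLS fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    n, p = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    # Leverages h_ii are the diagonal of the hat matrix; with design = QR
    # they equal the row sums of Q squared.
    q, _ = np.linalg.qr(design)
    leverage = (q ** 2).sum(axis=1)
    s2 = (resid @ resid) / (n - p)  # residual variance estimated from all points
    return resid / np.sqrt(s2 * (1.0 - leverage))

# Synthetic data for illustration only (assumed values, not the survey data).
rng = np.random.default_rng(1)
n = 200
X = rng.integers(0, 8, size=(n, 2)).astype(float)
y = 27.0 + 0.1 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0.0, 1.0, size=n)
y[:3] += 15.0  # plant three gross outliers

r = internally_studentized(X, y)
keep = np.abs(r) <= 3.0  # cutoff of 3 is an assumption, not taken from the report
print("excluded observations:", int((~keep).sum()))
```

The model would then be refitted on `X[keep]` and `y[keep]`. Because the variance estimate s² still includes the outliers themselves, internally studentized residuals are somewhat conservative compared with externally studentized ones.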

Because the data were collected by questionnaire, respondents’ reported frequencies of fast food consumption and exercise are likely to be biased. From the multiple regression model, we conclude that exercise affects BMI in a beneficial way: BMI decreases as the frequency of physical activity increases, which points to better physical health. Even though the estimated coefficient for frequency of fast food consumption is only 0.0922, it is consistent with our expectation before fitting the model that physical health declines as fast food consumption increases. In other words, eating health is negatively impacted by fast food consumption but positively impacted by exercise.

Hypothesis Testing

Conclusion